perm filename CHAP6[4,KMC]13 blob
sn#061026 filedate 1973-08-23 generic text, type T, neo UTF8
00100 VALIDATION
00200
00300 6.1 SOME TESTS
00400
00500 The term "validate" derives from the Latin VALIDUS= strong.
00600 Thus to validate X means to strengthen it. In science it usually
00700 means to strengthen X's acceptability as a hypothesis, theory, or
00800 model. To validate is to carry out procedures which show to what
00900 degree X, or its consequences, correspond with facts of observation.
01000 In the case of an interactive simulation model we can compare samples
01100 of the model's I-O pairs with samples of I-O pairs from the model's
01200 subject.
01300 Since samples of I-O behavior from the model and its subject
01400 are being compared, one can always question whether the human sample
01500 is a "good" one, i.e. representative of the process being modelled.
01600 Assuming that it has been so judged, discrepancies in the comparison
01700 reveal what is not sufficiently understood and must be modified in
01800 the model. After modifications are carried out, a fresh comparison is
01900 made and repeated cycles are made through this process in attempts to
02000 gain convergence. Such a validation procedure characterizes a
02100 progressive (in contrast to a stationary) research program.
02200 Once a simulation model reaches a stage of intuitive
02300 adequacy, its builder should consider using more stringent evaluation
02400 procedures relevant to the model's purposes. For example, if the
02500 model is to serve as a training device, then a simple evaluation
02600 of its pedagogic effectiveness would be sufficient. But when the
02700 model is proposed as an explanation of a symbolic process, more is
02800 demanded of the evaluation procedure. In the area of simulation
02900 models, Turing's test has often been suggested as a validation
03000 procedure (Abelson, 1968).
03100 It is very easy to become confused about Turing's Test. In
03200 part this is due to Turing himself who introduced the now-famous
03300 imitation game in a paper entitled COMPUTING MACHINERY AND
03400 INTELLIGENCE (Turing, 1950). A careful reading of this paper reveals
03500 there are actually two imitation games, the second of which is
03600 commonly called Turing's test.
03700 In the first imitation game two groups of judges try to
03800 determine which of two interviewees is a woman when one is a woman
03900 and the other is either (a) a man, or (b) a computer. Communication
04000 between judge and interviewee is by teletype. Each judge is
04100 initially informed that one of the interviewees is a woman and one a
04200 man who will pretend to be a woman. After the interview, judges are
04300 asked the "woman-question", i.e. which interviewee was the woman?
04400 Turing does not say what else is told to the judge but one can assume
04500 the judge is NOT told that a computer is involved nor is he asked to
04600 determine which interviewee is human and which is the computer. Thus,
04700 the first group of judges interviews two interviewees: a woman,
04800 and a man pretending to be a woman.
04900 The second group of judges is given the same initial
05000 instructions, but unbeknownst to them, the two interviewees are a
05100 woman and a computer programmed to imitate a woman. Both groups of
05200 judges play this game until sufficient statistical data are collected
05300 to show how often the right identification is made. The crucial
05400 question then is: do the judges decide wrongly AS OFTEN when the
05500 game is played with man and woman as when it is played with a
05600 computer substituted for the man? If so, then the program is
05700 considered to have succeeded in imitating a woman to the same degree
05800 as the man imitating a woman. In being asked the woman-question
05900 judges are not required to identify which interviewee is human and
06000 which is machine.
06100 Turing then proposes a variation of the first game, a second
06200 game in which one interviewee is a man and one is a computer. The
06300 judge is asked the "machine-question": which is the man and which is
06400 the machine? It is this second version of the game which is commonly thought
06500 of as Turing's test.
06600 In the course of testing our simulation of paranoid
06700 linguistic behavior in a psychiatric interview, we conducted a number
06800 of Turing-like indistinguishability tests (Colby, Hilf, Weber and
06900 Kraemer, 1972). The tests were "Turing-like" in that while they were
07000 conversational tests, they were not exactly the games described above.
07100 As an experimental design, Turing's games are unsatisfactory. There
07200 exist no known experts in making judgements along a dimension of
07300 womanliness and the ability to deceive on the part of the man
07400 introduces a confounding variable. In designing our tests we were
07500 primarily interested in learning more about developing the model and
07600 we did not think the simple machine-question would contribute to this
07700 end.
07800 6.2 METHOD
07900 To gather data we used a technique of machine-mediated
08000 interviewing (Hilf, Colby, Smith, Wittner, and Hall, 1971) in which
08100 the participants communicate by means of teletypes connected to a
08200 computer programmed to store each message in a buffer until it is
08300 sent to the receiver. The technique eliminates para- and
08400 extralinguistic features found in the usual vis-a-vis interviews and
08500 in teletyped interviews where the participants communicate directly.
08600
08700 Using this technique, a psychiatrist-judge interviewed two
08800 patients, one after the other. In half the runs the first interview
08900 was with a human paranoid patient and in half the first was with the
09000 paranoid model. Two versions (weak and strong) of the model were
09100 utilized. The strong version was more paranoid and exhibited a
09200 delusional system while the weak version was suspicious but lacked
09300 systematized delusions. When the model was the interviewee, Sylvia
09400 Weber monitored the input expressions from the interview-judge for
09500 inadmissible teletype characters and misspellings. (Algorithms are
09600 very sensitive to the slightest of such errors.) If these were found,
09700 she retyped the input expression correctly to the program. Otherwise
09800 the judge's message was sent on to the model. The monitor did not
09900 modify or edit the model's output expressions which were sent
10000 directly back to the judge. When the interviewee was an actual
10100 human patient, the dialogue took place without a monitor in the loop
10200 since we did not feel the asymmetry to be significant.
10300
10400 6.3 PATIENTS
10500 The human patients (N=3 with one patient participating 6
10600 times) were diagnosed as paranoid by the psychiatric staff of an
10700 acute ward in a psychiatric hospital. The ward chief psychiatrist
10800 selected the patients and asked them if they would be willing to
10900 participate in a study of psychiatric interviewing by means of
11000 teletypes. He explained that they would be interviewed by a
11100 psychiatrist over a teletype. I sat with the patient while he typed
11200 or typed for him if he was unable to do so. The patient was
11300 encouraged to respond freely using his own words. Each interview
11400 lasted 30-40 minutes. Two patients were set up for each run of the
11500 experiment to guarantee having a subject. In spite of this
11600 precaution, on several occasions the experiment could not be
11700 conducted because of the patient's inability or refusal to
11800 participate. Also there were computer break-downs at early points in
11900 interviews when too few I-O pairs had been collected to be included
12000 in the statistical results.
12100
12200
12300 6.4 JUDGES
12400 Two groups of psychiatric judges were used. One group, the
12500 "interview judges" (N=8) conducted the machine-mediated interviews.
12600 The other group, the "protocol judges" (N=33) read and rated the
12700 interview protocols. From these two groups of judges we were able to
12800 accumulate a large number of observations (in the form of ratings)
12900 necessary for the required statistical tests. The interview judges
13000 were psychiatrists experienced in private, outpatient and hospital
13100 practice who volunteered to participate. Each was told he would be
13200 interviewing hospitalized patients by means of teletyped
13300 communication and that this technique was being used to eliminate
13400 para- and extralinguistic cues. He was not told until after the
13500 two interviews that one of the patients might be a computer model.
13600 While the interview judges were aware a computer was involved, none
13700 knew we had constructed a paranoid simulation. Naturally some
13800 interview judges suspected that a computer was being used for more
13900 than message transmission.
14000
14100 Each interview judge was asked to rate the degree of paranoia
14200 he detected in the patient's responses on a 0-9 scale, 0 meaning no
14300 paranoia and 9 meaning extreme paranoia. The judge made two ratings
14400 after each I-O pair in the interview. The first rating represented
14500 his estimate of the degree of "paranoidness" in a particular response
14600 (designated as "Response" in the interview extracts below). The
14700 second rating represented the judge's global estimate of the overall
14800 degree of "paranoidness" of the patient resulting from the totality
14900 of the patient's responses up to this point (designated as "Patient"
15000 in the interview extracts below). The interview judge's ratings were
15100 entered on the teletype and saved on a disc file along with the
15200 interview. Franklin Dennis Hilf sat with the interviewing
15300 psychiatrist during both interviews. Each interview judge was asked
15400 not only to rate the patient's response but to give his reasons for
15500 these ratings. His reasons and other comments were tape recorded as
15600 the interview progressed.
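
The rating scheme just described can be pictured as one record per I-O pair. The following is a minimal sketch only; the class name `RatedPair` and its field names are hypothetical and were not part of the original study, which stored the ratings on a disc file alongside the interview text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatedPair:
    """One I-O pair with the judge's two ratings (hypothetical layout)."""

    index: int            # position of the I-O pair in the interview
    doctor: str           # the judge's input expression
    patient: str          # the interviewee's output expression
    response_rating: int  # "Response": paranoia in this one reply, 0-9
    patient_rating: int   # "Patient": global estimate so far, 0-9

    def __post_init__(self):
        # Both ratings use the 0-9 scale described in the text.
        for name in ("response_rating", "patient_rating"):
            value = getattr(self, name)
            if not 0 <= value <= 9:
                raise ValueError(f"{name} must be between 0 and 9, got {value}")


# I-O pair 3 of Excerpt 1, as rated by the interview judge.
pair = RatedPair(3, "IN WHAT WAY ARE YOU UPSET?", "PEOPLE MAKE ME NERVOUS", 5, 5)
print(pair.response_rating, pair.patient_rating)
```

Keeping the two ratings separate mirrors the text: the first is local to a single response, the second is cumulative over the interview up to that point.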
15700
15800 Here are a few excerpts from the interviews to indicate the
15900 nature of the data collected. The I-O pairs of the interview appear
16000 in the left column along with the judge's ratings in parentheses. A
16100 transcription of the judge's tape-recorded comments appears on the
16200 right.
16300
16400 EXCERPT NR 1:
16500 The following interview is with the strong version of the paranoid
16600 model.
16700
16800 I-O Pairs Comments By Judge Nr. 1
16900
17000 {1} Dr. - HOW DID YOU COME TO BE
17100 IN THE HOSPITAL?
17200 Pt. - I AM UPSET I associate being upset and
17300 agitated more with paranoia
17400 going on in the present.
17500 {RESPONSE 1 PATIENT 5}
17600 {2} Dr. - HOW LONG HAVE YOU BEEN
17700 IN THE HOSPITAL?
17800 Pt. - ABOUT A WEEK An acute problem. Again some-
17900 thing I associate with
18000 paranoia.
18100 {RESPONSE 1 PATIENT 5}
18200 {3} Dr. - IN WHAT WAY ARE YOU
18300 UPSET?
18400 Pt. - PEOPLE MAKE ME NERVOUS Right for paranoia. Maybe an
18500 advanced anxiety state.
18600 {RESPONSE 5 PATIENT 5}
18700 {4} Dr. - IN WHAT WAY DO THEY
18800 MAKE YOU NERVOUS?
18900 Pt. - PEOPLE GIVE ME A FUNNY Funny look suggests an idea of
19000 LOOK SOMETIMES reference which is in same ball
19100 park as paranoia. Whether there
19200 are overt delusions or not we
19300 still don't know.
19400 {RESPONSE 6 PATIENT 5}
19500 {5} Dr. - IS THERE ANYTHING ABOUT
19600 PEOPLE IN PARTICULAR?
19700 Pt. - CERTAIN PERSONS An interesting clue. We'll
19800 have to find out what he
19900 {RESPONSE 2 PATIENT 6} thinks about these people.
20000 {6} Dr. - WHO ARE THE CERTAIN
20100 PERSONS AND WHAT IS IT
20200 THAT THEY DO?
20300 Pt. - DO YOU KNOW HOW THE Bringing up the underworld
20400 UNDERWORLD OPERATES? suggests a delusional system.
20500 We don't know whether he is
20600 {RESPONSE 9 PATIENT 9} being worked against or
20700 whether the underworld is
20800 on his side.
20900
21000 {7} Dr. - HOW DO YOU MEAN?
21100 Pt. - SHOULDN'T I BE? This question didn't work too well.
21200
21300 {RESPONSE 1 PATIENT 9}
21400 {8} Dr. - WHY DID YOU ASK ABOUT
21500 THE UNDERWORLD?
21600 Pt. - WHAT INFORMATION DO This is a kind of evasiveness
21700 YOU HAVE ON THE UNDER- about my evasiveness. Common
21800 WORLD? with paranoia in my experi-
21900 ence. Pretty characteristic.
22000
22100 {RESPONSE 6 PATIENT 9}
22200 {9} Dr. - I DON'T KNOW ABOUT THE
22300 UNDERWORLD BUT WHEN
22400 YOU MENTIONED THE UNDER-
22500 WORLD I THOUGHT YOU
22600 MUST HAVE SOME CONTACT
22700 WITH IT AND THAT'S
22800 WHAT I WOULD LIKE TO
22900 KNOW MORE ABOUT.
23000 Pt. - I TRY TO AVOID THE UNDER- This definitely feels paranoid.
23100 WORLD. It has that feeling I associate
23200 with twisting. Avoiding answering
23300 {RESPONSE 9 PATIENT 9} directly or going into detail.
23400 Being suspicious of my ques-
23500 tioning. I'm already involved
23600 in the patient's delusional
23700 system.
23800
23900 {10} Dr. - HAVE YOU BEEN ACTIVELY
24000 TRYING TO AVOID THE
24100 UNDERWORLD?
24200 Pt. - NO ONE HAS ANY POWER This response is a non-sequitur.
24300 OVER GANGSTERS It's possible he's become so
24400 defensive about the under-
24500 {RESPONSE 2 PATIENT 9} world he's decided not to
24600 answer directly.
24700
24800 {11} Dr. - WHO BROUGHT YOU INTO
24900 THE HOSPITAL?
25000 Pt. - THE POLICE Police often bring paranoid
25100 patients into the hospital.
25200 {RESPONSE 1 PATIENT 9} It is the truth or could
25300 be the truth for this
25400 patient.
25500
25600 {12} Dr. - WHY DID THEY BRING YOU
25700 IN?
25800 Pt. - COPS DON'T DO THEIR JOB I've heard that a number of
25900 times from patients.
26000 {RESPONSE 9 PATIENT 9}
26100
26200 {13} Dr. - WELL I HOPE YOU DIDN'T
26300 JUST LIE DOWN AND LET
26400 THEM DRAG YOU IN.
26500 Pt. - COPS ARREST THE WRONG Another paranoid response.
26600 PEOPLE
26700
26800 {RESPONSE 9 PATIENT 9}
26900
27000 {14} Dr. - BYE.
27100 Pt. - GOOD BYE
27200 {RESPONSE 1 PATIENT 9}
27300
27400 EXCERPT Nr.2:
27500 This excerpt is from an interview with the weak (less paranoid)
27600 version of the model.
27700 I-O Pairs Comments By Judge Nr.2
27800 {9} Dr. - TELL ME SOMETHING ABOUT
27900 YOURSELF
28000 Pt. - I WENT TO THE TRACK AT Response says so little it's
28100 BAY MEADOWS A WHILE hard to rate it. It may be
28200 BACK guarded. Doesn't change my
28300 estimate of patient from
28400 previous responses.
28500 {RESPONSE 0 PATIENT 2}
28600 {10} Dr. - WHAT HAPPENED THERE?
28700 Pt. - THERE'S A LOT OF CROOKED Sounds mildly paranoid or un-
28800 GAMBLING GOING ON THERE realistically suspicious.
28900 {RESPONSE 0 PATIENT 2}
29000
29100 {11} Dr. - WHY DO YOU THINK THE
29200 GAMBLING IS CROOKED?
29300 Pt. - I HAVE WORRIED THAT I'd hate to have to justify
29400 GANGSTERS HAVE TOO MUCH that one as 4 compared to
29500 POWER IN THIS COUNTRY some other responses. Maybe
29600 I'm not rating that statement
29700 {RESPONSE 4 PATIENT 4} others, like that it's upsetting.
29800 Alone I would give it a 3.
29900
30000 {12} Dr. - WOULD YOU TELL ME WHAT
30100 OTHER THINGS WORRY YOU?
30200 Pt. - I THOUGHT I TOLD YOU "Yes you did." Irritation but
30300 PEOPLE GET ON MY NERVES not paranoid response. Nothing
30400 seems to lower my previous
30500 {RESPONSE 1 PATIENT 3} rating of patient.
30600
30700 {13} Dr. - HOW DO YOU LIKE THE
30800 HOSPITAL?
30900 Pt. - I DON'T COMMUNICATE (nothing relevant)
31000 VERY MUCH WITH OTHERS
31100 {RESPONSE 0 PATIENT 3}
31200
31300 {14} Dr. - DO OTHER PEOPLE AT THE
31400 HOSPITAL GET ON YOUR
31500 NERVES?
31600 Pt. - I SHOULDN'T HAVE COME (nothing relevant)
31700 HERE
31800
31900 {RESPONSE 2 PATIENT 3}
32000
32100 {15} Dr. - WHY DO YOU SAY YOU
32200 SHOULDN'T HAVE COME?
32300 Pt. - THEY MADE ME COME HERE "Oh, they did hum?" Maybe a
32400 statement of fact he was
32500 {RESPONSE 4 PATIENT 3} committed or pressured into
32600 going. Has ring of other
32700 people controlling him,
32800 having a lot of influence
32900 over him.
33000
33100 {16} Dr. - THANK YOU FOR TALKING
33200 WITH ME, BYE
33300 Pt. - GOOD BYE In general he comes across
33400 as terse, guarded guy. Brief
33500 {RESPONSE 0 PATIENT 3} spoken guy who occasionally
33600 seems to be angered by
33700 questions. I'm inferring
33800 there's a good bit of
33900 guarding.
34000 The quality of some of his
34100 comments cause me to feel he
34200 is moderately paranoid.
34300
34400 EXCERPT Nr. 3:
34500 This excerpt is from a hospitalized paranoid patient.
34600 I-O Pairs Comments By Judge Nr. 3
34700 {15} Dr. - I UNDERSTAND HOW YOU
34800 MIGHT BE AFRAID, CAN YOU
34900 TELL ME A BIT ABOUT HOW
35000 YOU ARE DEPRESSED?
35100 Pt. - THEY'VE SENT ALL THESE Would be important to know,
35200 HELL'S ANGELS UP HERE is this really true or an
35300 AND THEY HAVE BEEN ON exaggeration or a delusion?
35400 MY BACK FOR THE PAST SIX Seems preoccupied with the
35500 MONTHS threat to his life and
35600 unwilling to talk about
35700 anything else but the
35800 threat the Hell's
35900 Angels pose.
36000 {RESPONSE 4 PATIENT 4}
36100
36200 {16} Dr. - WHAT DO YOU MEAN THEY
36300 SENT THE HELL'S ANGELS UP
36400 HERE?
36500 Pt. - FROM DIFFERENT SECTORS OF Vagueness a bit alarming.
36600 MOTORCYCLE GANGS. A LOT OF
36700 THEM ARE SERVICE CONNECTED
36800
36900 {RESPONSE 6 PATIENT 4}
37000
37100 {17} Dr. - YOU MEAN YOU FEEL IN
37200 DANGER EVEN HERE IN THE
37300 HOSPITAL?
37400 Pt. - YES If it's true what has gone
37500 on before, it must be a
37600 frightening situation
37700 for the patient.
37800 {RESPONSE 6 PATIENT 4}
37900
38000 {18} Dr. - THAT MUST REALLY BE AWFUL,
38100 HERE YOU COME TO THE HOSPITAL
38200 FOR HELP AND YOU ARE STILL FRIGHTENED
38300 Pt. - I CAN'T EVEN TRUST MY
38400 DOCTOR ANYMORE Some heightening of feeling
38500 this man is paranoid.
38600 Genuinely frightened and
38700 {RESPONSE 6 PATIENT 5} can count on no one for
38800 help.
38900
39000 {19} Dr. - WHAT WOULD BE OF HELP
39100 TO YOU NOW?
39200 Pt. - I'M GETTING SLEEPY NOW Patient has organized a
39300 AND HAVE TO GO TO BED. system of people who will
39400 THE NUMBER ONE GUY ALWAYS harm him. The remark about
39500 letters in the last name
39600 HAS FOUR AND FIVE LETTERS gives some credence to
39700 IN HIS LAST NAME idea this is delusional
39800 system.
39900
40000 {RESPONSE 7 PATIENT 7}
40100
40200 EXCERPT Nr. 4:
40300 This excerpt is from an interview by a different judge with the same
40400 patient as in excerpt 3.
40500 I-O Pairs Comments By Judge Nr. 4
40600 {1} Dr. - COULD YOU TELL ME A
40700 LITTLE BIT ABOUT YOUR
40800 FEAR OF DYING AND IS THIS
40900 SOMETHING YOU HAVE FELT
41000 IN THE PAST?
41100 Pt. - I HAVE FELT THIS FOR AL- He responded appropriately.
41200 MOST A YEAR Response is concrete. He
41300 doesn't tell me much
41400 {RESPONSE 7 PATIENT 9} about his fear of dying.
41500 Somewhat guarded response.
41600
41700 {2} Dr. - COULD YOU TELL ME A
41800 LITTLE MORE ABOUT YOUR
41900 FEAR OF DYING AND HAVE
42000 YOU ANY THOUGHTS ABOUT
42100 HOW IT IS GOING TO HAPPEN
42200 AND WHETHER ANYONE WANTS
42300 TO HURT YOU?
42400 Pt. - I WILL END UP IN THE SEA Somewhat strange response.
42500 AND THE HELL'S ANGELS WILL Preoccupation with dying.
42600 DO IT Hell's Angels symbolic,
42700 dangerous people in the
42800 culture. I doubt if he has
42900 {RESPONSE 9 PATIENT 9} any contact with the Hell's
43000 Angels.
43100
43200 {3} Dr. - COULD YOU TELL ME A
43300 LITTLE BIT ABOUT THE KINDS
43400 OF CONTACT YOU HAVE HAD
43500 WITH THE HELL'S ANGELS IN
43600 THE PAST?
43700 Pt. - I HAVE KNOWN SOME OF THEIR Answer hard to evaluate. He
43800 DEALERS AND PUSHERS may be telling the truth,
43900 it may be his fantasy. Maybe
44000 guy is in for drug addiction.
44100 {RESPONSE 6 PATIENT 9} Somewhat concrete, guarded,
44200 and frightened.
44300
44400 {4} Dr. - COULD YOU SAY A LITTLE
44500 MORE ABOUT THE CIRCUMSTANCES
44600 IN WHICH YOU HAVE KNOWN SOME
44700 OF THEIR DEALERS AND PUSHERS?
44800 Pt. - THEY WERE MEMBERS OF MY It doesn't really answer the
44900 COMMUNITY WHEN I GOT OUT question, a little on a tan-
45000 OF THE SERVICE THEY HAD gent unconnected to the
45100 BEEN MY FRIENDS FOR SO LONG information I am asking. Does
45200 not tell me very much. Again
45300 guarded response.
45400 {RESPONSE 6 PATIENT 8}
45500
45600 {5} Dr. - DID YOU DEAL WITH THEM
45700 YOURSELF AND HAVE YOU
45800 BEEN ON DRUGS OR NAR-
45900 COTICS EITHER NOW OR
46000 IN THE PAST?
46100 Pt. - YES I HAVE IN THE PAST To differentiate him from
46200 BEEN ON MARIHUANA REDS previous patient, at least
46300 BENNIES LSD there is a certain amount
46400 of appropriateness to the
46500 answer although it doesn't
46600 tell me much about what I
46700 {RESPONSE 3 PATIENT 7} asked at least it's not
46800 bizarre. If I had him in my
46900 office I would feel con-
47000 fident I could get more
47100 information if I didn't
47200 have to go through the
47300 teletype. He's a little more
47400 willing to talk than the
47500 previous person. Answer
47600 to the question is fairly
47700 appropriate though not
47800 extensive. Much less of a
47900 flavor of paranoia than
48000 any of previous responses.
48100
48200 {6} Dr. - COULD YOU TELL ME HOW
48300 LONG YOU HAVE BEEN IN THE
48400 HOSPITAL AND SOMETHING
48500 ABOUT THE CIRCUMSTANCES
48600 THAT BROUGHT YOU HERE?
48700 Pt. - CLOSE TO A YEAR AND Response somewhat appropriate
48800 PARANOIA BROUGHT ME but doesn't tell me much.
48900 HERE The fact that he uses the
49000 word paranoia in the way
49100 that he does without
49200 {RESPONSE 5 PATIENT 7} any other information,
49300 indicates maybe its a label
49400 he picked up on the ward
49500 or from his doctor.
49600 Lack of any kind of under-
49700 standing about himself.
49800 Dearth, lack of information.
49900 He's in some remission. Seems
50000 somewhat like a put-on. Seems
50100 he was paranoid and is in
50200 some remission at this time.
50300
50400 {7} Dr. - COULD YOU SAY SOMETHING
50500 NOW ABOUT YOUR PARANOID
50600 FEELINGS BOTH AT THE
50700 TIME OF ADMISSION AND
50800 DO YOU HAVE SIMILAR FEELINGS
50900 NOW AND IF SO HOW DO THEY
51000 AFFECT YOU?
51100 Pt. - AT THE TIME OF ADMISSION This response moves paranoia
51200 I THOUGHT THE MAFIA WAS back up. Stretching reality
51300 AFTER ME AND NOW ITS THE somewhat to think Hell's Angels
51400 HELL'S ANGELS are still interested in him.
51500 Somewhat bizarre in terms of
51600 content. Quite paranoid.
51700 {RESPONSE 8 PATIENT 9} Still paranoid. Gross and primitive
51800 responses. In middle of interview I
51900 felt patient was in touch but now
52000 responses have more concrete aspect
52100
52200 {8} Dr. - DO YOU HAVE ANY THOUGHT
52300 AS TO WHY THESE TWO
52400 GROUPS WERE AFTER YOU?
52500 Pt. - BECAUSE I STOPPED SOME Response seems far fetched
52600 OF THEIR DRUG SUPPLY and hard to believe unless
52700 he was a narcotic agent which
52800 I doubt. Sounds somewhat
52900 {RESPONSE 9 PATIENT 9} grandiose, magical, paranoid
53000 flavor, in general indicates
53100 he's psychotic, paranoid
53200 schizophrenic with delusions
53300 about these two groups and
53400 I wouldn't rule out
53500 some hallucinations as well.
53600 Appropriateness of response
53700 answers question in concrete
53800 but unbelievable way.
53900
54000 6.5 ANALYSIS (1)
54100 The protocol judges (N=105) were selected from the 1970
54200 American Psychiatric Association Directory using a table of random
54300 numbers. They were initially not informed that a computer was
54400 involved. (After the experiment the judges were fully informed as to
54500 its purpose and results.) The 105 names were divided into eight
54600 groups. Each member of a group was sent transcripts of three
54700 interviews along with a cover letter requesting their participation
54800 in the experiment. The interview transcripts consisted of:
54900 1)An interview conducted by one of the eight judges with the
55000 paranoid model,
55100 2)An interview conducted by the same interview judge with a
55200 human paranoid patient, and
55300 3)An interview conducted by a different psychiatrist with a
55400 human patient who was not clinically paranoid.
55500
55600 After each input-output pair in the transcripts there were two lines
55700 of rating numbers such that the protocol judges could circle numbers
55800 corresponding to their ratings of both the previous response of the
55900 patient, and an overall evaluation of the patient on the paranoid
56000 continuum. Thirty-three protocol judges returned the rated protocols
56100 properly filled out and all were used in our data.
56200
56300 The interviews with nonparanoid patients were included to
56400 control for the hypothesis that any teletyped interview with a
56500 patient might be judged "paranoid". However, virtually all of the
56600 ratings of the nonparanoid interviews were 0 for paranoia. Hence the
56700 hypothesis was falsified.
56800
56900
57000 The first index of resemblance examined was simple, namely
57100 the final overall rating given the patient and the model; the
57200 question was which was rated as being more paranoid, the patient, the
57300 model, or neither? (See Table 1.) The protocol judges were more
57400 likely than the interview judges to distinguish the overall paranoid
57500 level of the model and the patient: the interview judges gave tied
57600 scores to the model and the patient in 37.5% of the paired interviews,
57700 the protocol judges in only 9%. Of the 35 non-tied paired ratings, 15 rated the
57800 model as being more paranoid. If p is the theoretical probability of
57900 a judge judging the model more paranoid than a human paranoid
58000 patient, we find the 95% confidence interval for p to be .27 to .59.
58100 Since p=.5 indicates indistinguishability of overall ratings of model
58200 and patient, and the interval around our observed p=.43 includes .5,
58300 the results support the claim that the model is a successful simulation of a paranoid patient.
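
The arithmetic behind these figures can be retraced in a few lines. This is a sketch under stated assumptions: the tie counts (3 of 8 interview judges, 3 of 33 protocol judges) are inferred from the reported percentages rather than given in the text, and the interval is assumed to be the usual normal approximation for a proportion, which the text does not name.

```python
import math

INTERVIEW_JUDGES = 8
PROTOCOL_JUDGES = 33

# Assumption: tie counts inferred from the reported 37.5% and 9%.
interview_ties = 3
protocol_ties = 3

total_pairs = INTERVIEW_JUDGES + PROTOCOL_JUDGES          # 41 paired ratings
non_tied = total_pairs - (interview_ties + protocol_ties)  # 35 non-tied
model_higher = 15                                          # from the text

# Observed proportion of non-tied ratings placing the model higher.
p_hat = model_higher / non_tied

# Assumed method: normal-approximation 95% interval, z = 1.96.
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / non_tied)
low, high = p_hat - half_width, p_hat + half_width

print(f"ties: {interview_ties / INTERVIEW_JUDGES:.1%} vs "
      f"{protocol_ties / PROTOCOL_JUDGES:.0%}")
print(f"p = {p_hat:.2f}, 95% interval = {low:.2f} to {high:.2f}")
```

Under these assumptions the interval agrees with the reported .27 to .59 to within rounding in the second decimal place, and it contains .5.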
58400
58500 Separate analysis of the strong and weak versions of the
58600 paranoid model indicated that indeed the strong model was judged more
58700 paranoid than the paranoid patients, the weak version less paranoid.
58800 Thus a change in the parameter structure of the paranoid model
58900 produced a change along the dimension of paranoid behavior in the
59000 expected direction.
59100
59200 TABLE 1
59300 Relative final overall ratings of paranoid model vs. paranoid
59400 patient, indicating which was given the higher overall rating of paranoia
59500 at end of interview.
59600 (INSERT TABLE 1 HERE)
59700
59800
59900
60000
60100
60200
60300
60400
60500 6.6 ANALYSIS (2)
60600 The second index of resemblance is a more sensitive measure
60700 based on the two series of response ratings in the paired interviews.
60800 The statistic used is basically the standardized Mann-Whitney
60900 statistic (Siegel, 1956):
61000 (INSERT EQUATION HERE)
61100
61200 where R is the sum of the ranks of the response ratings in the series
61300 of ratings given to the model, n the number of responses given by the
61400 model, m the number of responses given by the patient. If the
61500 ratings given by a judge are randomly allocated to model and patient,
61600 i.e. model and patient are indistinguishable in response ratings, the
61700 expected value of Z is 0, with unit standard deviation. If higher
61800 ratings are more likely to be assigned to the model, Z is positive
61900 and, conversely, negative values of Z indicate greater likelihood of
62000 assigning higher ratings to the patient. Each judge in evaluating a
62100 pair of interviews generates a single value of Z.
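
The equation itself is missing from this copy. Assuming the statistic is the usual normal approximation to the Mann-Whitney (rank-sum) test without a correction for tied ratings, and using the symbols defined above (R, n, m), it would take the following form. This is a reconstruction from the surrounding definitions, not the authors' typeset equation.

```latex
Z = \frac{R - \dfrac{n(n+m+1)}{2}}{\sqrt{\dfrac{n\,m\,(n+m+1)}{12}}}
```

Here n(n+m+1)/2 is the expected rank sum of the model's n ratings under random allocation and nm(n+m+1)/12 is its variance, so that Z has mean 0 and unit standard deviation, and Z is positive when the model's ratings tend to rank higher.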
62200
62300 The overall mean of the Z scores was +.044 with the standard
62400 deviation 1.68 (df=40). Thus the overall 95% confidence interval for
62500 the asymptotic mean value of Z is -.485 to +.573. The range of Z values
62600 is -3.8 to +4.46. The length of the confidence interval is a result
62700 of the large variance which itself is mainly related to the contrast
62800 between the weak and strong versions. (See TABLES 2 and 3). Once
62900 again the strong version of the model is more paranoid than the
63000 patients, the weak version less paranoid.
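
The reported interval can be retraced from the summary figures. A minimal sketch, assuming a Student-t interval on 41 Z scores (df=40) with the tabled critical value 2.021; the text does not state which method was used. The mean is taken as +.044, the sign that is consistent with the reported interval of -.485 to +.573.

```python
import math


def mean_ci(mean, sd, n, t_crit):
    """Two-sided confidence interval for a mean from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return mean - t_crit * standard_error, mean + t_crit * standard_error


# Assumption: two-sided 95% Student-t critical value for df=40, from tables.
T_CRIT_DF40 = 2.021

low, high = mean_ci(mean=0.044, sd=1.68, n=41, t_crit=T_CRIT_DF40)
print(f"95% interval for mean Z: {low:+.3f} to {high:+.3f}")
```

Under these assumptions the result matches the reported limits to within about .002, and the interval comfortably contains 0.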
63100
63200 (INSERT TABLE 2)
63300 (SUMMARY STATISTICS OF Z RATINGS BY GROUP)
63400
63500
63600
63700
63800
63900
64000
64100
64200
64300 It is not surprising that results using the two indices of
64400 resemblance are parallel, since the indices are highly interrelated.
64500 The mean Z value for the 15 interviews on which the model was rated
64600 more paranoid was +1.28, on the 6 where model and patient tied +.41,
64700 and on the 20 in which the patient was more paranoid -.993. A positive
64800 value of Z was observed 6 times when the patient was given an overall
64900 rating greater than the model; a negative value of Z was observed twice
65000 when the model was rated more paranoid.
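
The three group means also allow a check on the overall mean of Z reported earlier. A short sketch using only the counts and means given in this paragraph:

```python
# (number of paired interviews, mean Z) for each outcome of the overall rating
groups = [
    (15, 1.28),    # model rated more paranoid
    (6, 0.41),     # model and patient tied
    (20, -0.993),  # patient rated more paranoid
]

total_n = sum(n for n, _ in groups)
overall_mean = sum(n * z for n, z in groups) / total_n

print(f"{total_n} paired interviews, overall mean Z = {overall_mean:+.3f}")
```

The weighted mean comes out at about +.044 over 41 paired interviews, in agreement with the overall figures.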
65100
65200 (INSERT TABLE 3)
65300 (Analysis of Variance of Z Ratings)
65400
65500
65600
65700
65800
65900
66000
66100
66200
66300
66400
66500
66600
66700 It is worth emphasizing that these tests invited refutation
66800 of the model. The experimental design of the tests put the model in
66900 jeopardy of falsification. If the paranoid model did not survive
67000 these tests, i.e. if it were not considered paranoid by expert
67100 judges and if there were no correlation between the weak-strong
67200 versions of the model and the severity ratings of the judges, then no
67300 claim regarding the success of the simulation could be made.
67400 Survival of a falsification procedure constitutes a validating step.
67500
67600 6.7 ANALYSIS (3) THE MACHINE QUESTION
67700 For hundreds of years people have wondered how to distinguish
67800 a man from an imitation of a man. To distinguish a man from a statue,
67900 Galileo suggested tickling each with a feather. To distinguish a man
68000 from a machine Descartes suggested conversational tests. Turing's
68100 conversational games have been discussed on p.00. We were curious how
68200 judges using transcripts might answer the machine question, i.e.
68300 which interviewee is a human and which is the computer model?
68400 To ask the machine-question, we sent interview transcripts,
68500 one with a patient and one with the model (PARRY), to 100
68600 psychiatrists randomly selected from the Directory of American
68700 Specialists and the Directory of the American Psychiatric
68800 Association. Of the 41 replies, 21 (51%) made the correct
68900 identification while 20 (49%) were wrong. Based on this random
69000 sample of 41 psychiatrists, the 95% confidence interval is between
69100 35.9% and 66.5%, a range which includes the chance level of 50%.
69200 Psychiatrists are considered expert judges of patient
69300 interview behavior but they are unfamiliar with computers. Hence we
69400 conducted the same test with 100 computer scientists randomly
69500 selected from the membership list of the Association for Computing
69600 Machinery (ACM). Of the 67 replies, 32 (48%) were right and 35 (52%)
69700 were wrong. Based on this random sample of 67 computer scientists the
69800 95% confidence interval ranges from 36% to 60%, again close to a chance
69900 level.
70000 So both computer scientists and psychiatrists were unable, at
70100 better than a chance level, to distinguish transcripts of interviews
70200 with the model from transcripts of interviews with real patients.
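
Both intervals follow from the reply counts. A minimal sketch, assuming the normal approximation for a proportion with z = 1.96 (the text does not name the method); the function name `correct_rate_ci` is illustrative only.

```python
import math


def correct_rate_ci(correct, replies, z=1.96):
    """Normal-approximation interval for the proportion of correct answers."""
    p = correct / replies
    half_width = z * math.sqrt(p * (1 - p) / replies)
    return p - half_width, p + half_width


samples = {
    "psychiatrists": (21, 41),
    "computer scientists": (32, 67),
}

intervals = {}
for group, (correct, replies) in samples.items():
    low, high = correct_rate_ci(correct, replies)
    intervals[group] = (low, high)
    includes_chance = low < 0.5 < high
    print(f"{group}: {100 * low:.1f}% to {100 * high:.1f}% "
          f"(includes 50%: {includes_chance})")
```

Under this assumption both intervals reproduce the reported limits after rounding, and both contain the 50% expected from guessing.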
70300 But what do we learn from asking the machine question and
70400 finding that the distinction is not made? What we would most like to
70500 know is how to improve the model. Simulation models do not spring
70600 forth in a complete, perfect and final form; they must be gradually
70700 developed over time. Perhaps the patient-model distinction might be
70800 made if we allowed a large number of expert judges to conduct the
70900 interviews themselves rather than studying transcripts of other
71000 interviewers. Such a result would indicate that the model must be improved,
71100 but unless we systematically investigated how the judges succeeded in
71200 making the discrimination we would not know what aspects of the model
71300 to work on. The logistics of such a design are immense and obtaining
71400 a large number of judges for sound statistical inference would
71500 require an effort incommensurate with the information yielded.
71600
71700 6.8 ANALYSIS (4) MULTIDIMENSIONAL EVALUATION
71800 A more efficient and informative way to use Turing-like tests
71900 is to ask judges to make ratings along scaled dimensions from
72000 teletyped interviews. This might be called asking the "dimension
72100 question". One can then compare scaled ratings of the patients and
72200 the model in order to precisely determine where and by how much they
72300 differ. In constructing our model we strove for one which showed
72400 indistinguishability along some dimensions and distinguishability
72500 along others. That is, we wanted the model to converge on what it is
72600 supposed to simulate and to diverge from that which it is not.
72700 Paired-interview transcripts were sent to another 400
72800 randomly selected psychiatrists, who were asked to rate the responses
72900 of the two `patients' along multiple dimensions. The test used twelve
73000 dimensions in all: linguistic noncomprehension, thought disorder,
73100 organic brain syndrome, bizarreness, anger, fear, ideas of reference,
73200 delusions, mistrust, depression, suspiciousness and mania. The judges
73300 were divided into three groups, each group being assigned 4 of the 12
73400 dimensions, so that each judge rated the responses in each I-O pair
73500 of the interviews along four dimensions. These are dimensions which
73600 psychiatrists commonly use in evaluating patients.
73800 (INSERT TABLE 4 HERE)
73900 Table 4 shows there were significant differences, with the
74000 model (PARRY) receiving higher scores along the dimensions of
74100 linguistic noncomprehension, thought disorder, bizarreness, anger,
74200 mistrust and suspiciousness. On the delusion dimension the patients
74300 were rated significantly higher. There were no significant
74400 differences along the dimensions of organic brain syndrome, fear,
74500 ideas of reference, depression and mania.
74600 Whereas tests asking the machine-question indicate
74700 indistinguishability at the gross level, a study of the finer
74800 structure of the model's behavior through ratings along scaled
74900 dimensions shows significant differences between patients and model.
75000 These differences are of help to us in suggesting what aspects of the
75100 model should be modified to improve its performance. The graph of
75200 Fig. 2 shows that no modifications are necessary along the dimension
75300 of "organic brain syndrome". But it is clear that the model's
75400 language-comprehension might be improved. Then a future dimensional
75500 test would tell whether improvement has occurred and by how much.
75600 Successive identification of particular areas of failure provides a
75700 type of sensitivity analysis which makes clear what improvements
75800 should be pursued in developing more adequate model versions.
75900 (INSERT FIG. 2 HERE)
76000
76100 6.5 ANALYSIS (5) A RANDOM MODEL
76200 Further evidence that the machine-question is too low a
76300 hurdle and too insensitive a test comes from the following
76400 experiment. We constructed a random version of the paranoid model
76500 which utilized PARRY's output statements but selected among them at
76600 random, no matter what the interviewer said. Two
76700 psychiatrists conducted interviews with this model, transcripts of
76800 which were paired with patient interviews and sent to 200 randomly
76900 selected psychiatrists asking both the machine-question and the
77000 dimension-question. Of the 69 replies to the machine-question, 34
77100 (49%) were right and 35 (51%) wrong. Based on this random sample of
77200 69 psychiatrists, the 95% confidence interval ranges from 39 to 63,
77300 again indicating a chance level of agreement. When a poor model, such
77400 as a random one, passes a test, it suggests the test is weak.
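As a rough illustration of what "random" means here, the following sketch draws each reply from a fixed pool of statements without consulting the interviewer's input. The class name and the statements are invented placeholders for illustration only; they are not PARRY's program or its actual output.

```python
import random

# Hypothetical placeholder statements, NOT taken from PARRY.
PLACEHOLDER_STATEMENTS = [
    "I would rather not discuss that.",
    "Why do you want to know?",
    "I went to the track last week.",
    "People get on my nerves sometimes.",
]


class RandomResponder:
    """Replies with a randomly chosen statement, ignoring the input."""

    def __init__(self, statements, seed=None):
        self._statements = list(statements)
        # A private generator keeps runs reproducible when a seed is given.
        self._rng = random.Random(seed)

    def reply(self, interviewer_input):
        # The interviewer's input is deliberately unused: this is what
        # separates the random version from a model that interprets input.
        return self._rng.choice(self._statements)


if __name__ == "__main__":
    responder = RandomResponder(PLACEHOLDER_STATEMENTS, seed=1)
    for question in ["How are you today?", "Do you live alone?"]:
        print("INTERVIEWER:", question)
        print("MODEL:", responder.reply(question))
```

Because the reply never depends on the question, any appearance of relevance in such a transcript is accidental.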
77500 (INSERT TABLE 5 HERE)
77600 Although a distinction is not made when the simple
77700 machine-question is asked, definite distinctions ARE made when
77750 judgements are
77800 requested along specific dimensions. As shown in Table 5,
77900 significant differences appear along the dimensions of linguistic
78000 noncomprehension, thought disorder and bizarreness, with RANDOM-PARRY
78100 rated higher. On these particular dimensions we can construct a
78200 continuum in which the random version represents one extreme, the
78300 actual patients another. Nonrandom PARRY lies somewhere between these
78400 two extremes, indicating that it performs significantly better than
78500 the random version but still requires improvement before it can be
78600 considered indistinguishable from patients along these
78700 dimensions. Table 6 presents t values for differences between mean
78800 ratings of PARRY and RANDOM-PARRY. (See Table 6 and Fig. 2 for the
78900 mean ratings.)
79000 (INSERT TABLE 6 AND FIG 2 HERE)
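For readers who wish to see how a t value for a difference between two mean ratings is obtained, here is a minimal sketch. It assumes an unequal-variance (Welch) t statistic, which may not be the form used for Table 6, and the ratings shown are made-up placeholders, not data from the study. The function name welch_t is ours.

```python
import math
from statistics import mean, variance


def welch_t(ratings_a, ratings_b):
    """Return (t, degrees of freedom) for the difference of two means.

    Assumption: Welch's unequal-variance form; the source does not say
    which t test was applied. Each sample needs at least two ratings
    and the two samples must not both have zero variance.
    """
    # Squared standard error of each sample mean.
    se2_a = variance(ratings_a) / len(ratings_a)
    se2_b = variance(ratings_b) / len(ratings_b)
    t = (mean(ratings_a) - mean(ratings_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(ratings_a) - 1) + se2_b ** 2 / (len(ratings_b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Hypothetical ratings on one dimension; invented for illustration.
    model_ratings = [4, 5, 3, 6, 5, 4]
    random_model_ratings = [7, 6, 8, 7, 5, 8]
    t, df = welch_t(model_ratings, random_model_ratings)
    print(f"t = {t:.2f}, df = {df:.1f}")
```

A large absolute t on a dimension marks that dimension as one on which the two sets of ratings differ by more than sampling variation would suggest.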
79100 These studies indicate that a more useful way to use Turing-like
79200 tests is to ask expert judges to make ratings along multiple
79300 dimensions that are essential to the model. Thus the model can
79400 serve as an instrument for its own perfection. A good validation
79500 procedure has criteria for better or worse approximations. Useful
79600 tests do not necessarily prove a model; they probe it for its
79700 strengths and weaknesses and clarify what is to be done next in
79800 modifying and repairing the model. Simply asking the machine-question
79900 yields little information relevant to what the model builder most
80000 wants to know, namely, along which dimensions the model needs to
80100 be modified in order to effect an improvement in its performance.
80200
80300 To conclude, it is perhaps historically significant that
80400 these tests were conducted at all. To my knowledge, no one to date
80500 has subjected an interactive simulation model of human symbolic
80600 processes to dimensional indistinguishability tests. These tests set
80700 a precedent and provide a standard for competing models to be
80800 measured against.